On Empirical Sentiment Accuracy Bounds
     Shawn Rutledge, Chief Scientist
Visible’s Sentiment Approach
Visible was one of the first Social Media Monitoring solutions in the market.

Algorithms
• State of the art
• Beyond overhyped NLP

Features
• Deep experience
• Social NLP & Context

Data
• Massive proprietary data

A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises.

We have tens of millions of human-annotated social media posts. Basically all breakthroughs in the last two decades have come from better data.

Copyright © 2011 Visible. All rights reserved.
Sentiment, The Accuracy Disconnect
There is a disconnect between the hype and the experience in the marketplace.

• Claims: “We have 97% Accuracy”
• Experience: “The best vendor tested had 50% accuracy at the post level”
• Experience: Sentiment accuracy is the most dissatisfying feature according to Forrester research; only 45% are satisfied with vendor sentiment accuracy
Key Findings
After several years of research with the best available data, here are some of the key findings.

1. Solve relevance first, sentiment second.
2. Accuracy is the wrong measure to optimize.
3. Sentiment is more subjective than you think it is.

We won’t have time to cover the first two. The third could be an alternate title for this talk.
Audit Findings, Large Financial Institution
A typical study.

Double Blind, Multi-Reviewer Study:
1. Same posts labeled by both human labeling practice and automation.
2. At least two auditors grade each label. Blind to label source.

No statistically significant difference between human labeled and AI labeled sentiment. Reviewers can’t tell the difference between Visible’s statistical models and human annotators.

So is Sentiment “solved”? But…

Auditors agree with each other only 73% of the time [95% CI: 69%-77%].

No. Auditors think people and automation are both poor. And they don’t agree with each other.
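A figure like the 73% agreement with its 95% confidence interval can be reproduced from paired auditor grades. The sketch below is only an illustration: the function names, the choice of the Wilson score interval, and the sample size of 400 are assumptions, since the slides do not say how the interval was computed or how many posts were audited.

```python
import math


def agreement_rate(grades_a, grades_b):
    """Fraction of items on which two auditors gave the same grade."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two equal-length, non-empty grade lists")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical audit: 292 agreements out of 400 double-graded posts (73%).
    low, high = wilson_interval(292, 400)
    print(f"agreement 73%, 95% CI: {low:.1%} to {high:.1%}")
```

The interval narrows as more posts are double-graded, so a small audit can make two vendors look different when the gap is within noise.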
Key Audit Findings, Large Financial Institution
Social Media Professionals Grading Human Annotations
Another way of looking at the same study.

• Both auditors agree with label only 58% of the time (proxy for “hard” graders)
• At least one auditor agrees with label 91% of the time (proxy for “easy” graders)

58% - 91% is a huge range.
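The two ends of that range come from the same grades counted two ways. A minimal sketch, assuming each auditor's verdict on a label is recorded as a boolean (True means the auditor agrees with the label); the function name and data layout are hypothetical and not taken from the study:

```python
def strict_and_lenient_rates(agrees_1, agrees_2):
    """Return (both-auditors-agree rate, at-least-one-agrees rate)."""
    pairs = list(zip(agrees_1, agrees_2))
    if not pairs:
        raise ValueError("need at least one graded label")
    both = sum(a and b for a, b in pairs)    # "hard" grading: both must accept
    either = sum(a or b for a, b in pairs)   # "easy" grading: one accept is enough
    return both / len(pairs), either / len(pairs)


if __name__ == "__main__":
    # Hypothetical grades for five labels from two auditors.
    auditor_1 = [True, True, False, True, False]
    auditor_2 = [True, False, False, True, True]
    strict, lenient = strict_and_lenient_rates(auditor_1, auditor_2)
    print(f"both agree: {strict:.0%}, at least one agrees: {lenient:.0%}")
```

Reporting both numbers side by side makes it clear how much of a quoted accuracy figure depends on the grading rule rather than on the labels themselves.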
True Across a Wide Variety of Problems
This talk promised bounds and here they are.

Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates.

About 81% Inter-Annotator Agreement [IQR: 78% - 83%]

80% is also consistent with academic research.
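A summary of the form "about 81% [IQR: 78% - 83%]" is a median and interquartile range taken over per-audit agreement rates. The sketch below shows one way to compute it with Python's standard library; the rates listed are made-up placeholders, not the audit results behind the slide, and the "inclusive" quantile method is an assumption.

```python
import statistics


def median_and_iqr(rates):
    """Return (median, (first quartile, third quartile)) of agreement rates."""
    q1, q2, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    return q2, (q1, q3)


if __name__ == "__main__":
    # Hypothetical per-brand agreement rates from separate audits.
    rates = [0.76, 0.78, 0.80, 0.81, 0.82, 0.83, 0.85]
    median, (q1, q3) = median_and_iqr(rates)
    print(f"median {median:.0%} [IQR: {q1:.1%} - {q3:.1%}]")
```

The interquartile range is a reasonable choice for this kind of summary because one unusually easy or hard brand does not move it much.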
Take Aways
1. Yes, your team
2. Evaluating sentiment takes care
3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit)
4. It will take effort to get your team in tight agreement on sentiment definitions
5. Real breakthroughs in sentiment accuracy will come from personalization

We all think we’re better than average drivers. Similarly, although most of us have heard something like the 80% agreement statistic, we don’t think it applies to us. The main thing I want you to take away from this talk is that it does apply to you. People within your department, your team, sitting in the cube next to you, disagree with you about 20% of the time.

The implications are also worth taking to heart. When people claim accuracies much higher than 80% they are either lying or they don’t know what they are doing (overfit to one dataset).

Similar to what has happened in Search, real breakthroughs will come through personalization. Deeper linguistics (dealing with sarcasm, humor, contextual knowledge) are interesting but can’t help break the 80% barrier.

If teams put the work into getting tight, consistent sentiment definitions (with >80% agreement), only then do algorithms have a chance to do that well.
@shawnrut
@Visible
VisibleTechnologies.com

Thank You!

Empirical Sentiment Accuracy Bounds

  • 1. On Empirical Sentiment Accuracy Bounds Shawn Rutledge, Chief Scientist
  • 2. Visible’s Sentiment Approach. Visible was one of the first Social Media Monitoring solutions in the market. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; social NLP & context. Data: massive proprietary data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Copyright © 2011 Visible. All rights reserved.
  • 3. Visible’s Sentiment Approach. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; social NLP & context. Data: massive proprietary data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. We have tens of millions of human-annotated social media posts.
  • 4. Visible’s Sentiment Approach. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; social NLP & context. Data: massive proprietary data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Basically all breakthroughs in the last two decades have come from better data.
  • 5. Sentiment, The Accuracy Disconnect. There is a disconnect between the hype and the experience in the marketplace. Claims: “We have 97% Accuracy.” Experience: “The best vendor tested had 50% accuracy at the post level.” Experience: sentiment accuracy is the most dissatisfying feature according to Forrester research; only 45% are satisfied with vendor sentiment accuracy.
  • 6. Key Findings. After several years of research with the best available data, here are some of the key findings. 1. Solve relevance first, sentiment second. 2. Accuracy is the wrong measure to optimize. 3. Sentiment is more subjective than you think it is.
  • 7. Key Findings. 1. Solve relevance first, sentiment second. 2. Accuracy is the wrong measure to optimize. 3. Sentiment is more subjective than you think it is. We won’t have time to cover the first two. The third could be an alternate title for this talk.
  • 8. Audit Findings, Large Financial Institution. A typical study. Double Blind, Multi-Reviewer Study: 1. Same posts labeled by both human labeling practice and automation. 2. At least two auditors grade each label, blind to label source. Result: no statistically significant difference between human-labeled and AI-labeled sentiment. Reviewers can’t tell the difference between Visible’s statistical models and human annotators.
  • 9. Audit Findings, Large Financial Institution. Double Blind, Multi-Reviewer Study: 1. Same posts labeled by both human labeling practice and automation. 2. At least two auditors grade each label, blind to label source. Result: no statistically significant difference between human-labeled and AI-labeled sentiment. So is sentiment “solved”? No. Auditors think people and automation are both poor. And they don’t agree with each other: auditors agree with each other only 73% of the time [95% CI: 69%-77%].
  • 10. Key Audit Findings, Large Financial Institution. Social Media Professionals Grading Human Annotations. Another way of looking at the same study. Both auditors agree with the label only 58% of the time (proxy for “hard” graders). At least one auditor agrees with the label 91% of the time (proxy for “easy” graders). 58% - 91% is a huge range.
  • 11. True Across a Wide Variety of Problems. Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates. About 81% Inter-Annotator Agreement [IQR: 78% - 83%]. This talk promised bounds and here they are.
  • 12. True Across a Wide Variety of Problems. Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates. About 81% Inter-Annotator Agreement [IQR: 78% - 83%]. 80% is also consistent with academic research.
  • 13. Take Aways. 1. Yes, your team. 2. Evaluating sentiment takes care. 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit). 4. It will take effort to get your team in tight agreement on sentiment definitions. 5. Real breakthroughs in sentiment accuracy will come from personalization. We all think we’re better than average drivers. Similarly, although most of us have heard something like the 80% agreement statistic, we don’t think it applies to us. The main thing I want you to take away from this talk is that it does apply to you. People within your team, in your department, sitting in the cube next to you, disagree with you about 20% of the time.
  • 14. Take Aways. 1. Yes, your team. 2. Evaluating sentiment takes care. 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit). 4. It will take effort to get your team in tight agreement on sentiment definitions. 5. Real breakthroughs in sentiment accuracy will come from personalization. The implications are also worth taking to heart. When people claim accuracies much higher than 80% they are either lying or they don’t know what they are doing (overfit to one dataset).
  • 15. Take Aways. 1. Yes, your team. 2. Evaluating sentiment takes care. 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit). 4. It will take effort to get your team in tight agreement on sentiment definitions. 5. Real breakthroughs in sentiment accuracy will come from personalization. Similar to what has happened in Search, real breakthroughs will come through personalization. Deeper linguistics (dealing with sarcasm, humor, contextual knowledge) are interesting but can’t help break the 80% barrier. If teams put the work into getting tight, consistent sentiment definitions (with >80% agreement), only then do algorithms have a chance to do that well.
  • 16. @shawnrut @Visible VisibleTechnologies.com Thank You!
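
The agreement figures on slides 9 and 10 (auditors agreeing with each other, both auditors accepting a label, at least one accepting it) can be illustrated with a minimal sketch. The talk does not say how its 95% interval was computed; the Wilson score interval below is an assumption, the function names are hypothetical, and the verdict lists are made-up toy data rather than results from the audit.

```python
import math
from typing import Sequence, Tuple


def agreement_rate(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Fraction of labels on which two auditors gave the same verdict."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty verdict lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def hard_easy_rates(a: Sequence[bool], b: Sequence[bool]) -> Tuple[float, float]:
    """Return (both auditors accept the label, at least one accepts it)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty verdict lists of equal length")
    n = len(a)
    both = sum(x and y for x, y in zip(a, b)) / n
    either = sum(x or y for x, y in zip(a, b)) / n
    return both, either


# Toy data (assumption): True means the auditor judged the label correct.
auditor_1 = [True, True, True, False, True, True, False, True, True, True]
auditor_2 = [True, False, True, False, True, True, True, True, False, True]

rate = agreement_rate(auditor_1, auditor_2)
matches = sum(x == y for x, y in zip(auditor_1, auditor_2))
low, high = wilson_interval(matches, len(auditor_1))
both, either = hard_easy_rates(auditor_1, auditor_2)

print(f"auditors agree with each other: {rate:.0%} [95% CI: {low:.0%}-{high:.0%}]")
print(f"both accept label ('hard' graders): {both:.0%}")
print(f"at least one accepts ('easy' graders): {either:.0%}")
```

With only ten toy items the interval is very wide; an interval as tight as the one quoted on slide 9 would need several hundred graded labels.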